class: center, middle, inverse, title-slide

.title[
# Lecture 27
]
.subtitle[
## Generalized Linear Models: simple logistic regression
]
.author[
### Psych 10 C
]
.institute[
### University of California, Irvine
]
.date[
### 06/03/2022
]

---

## Null model

- Last class we went over the null model in a simple logistic regression setting.

--

- According to the model, the probability of testing positive on a "perfect" test is the same for all participants, regardless of the population that they belong to.

--

- The Null model is formalized as follows:

`$$y_i \sim \text{Bernoulli}\left(\theta\right)$$`

--

- Where `\(\theta\)` can be found from the definition of the logarithm of the odds:

`$$\ln\left(\frac{\theta}{1-\theta}\right) = \beta_0$$`

`$$\implies \theta = \frac{e^{\beta_0}}{1+e^{\beta_0}}$$`

---

## Null model

- Remember that once we get a value for `\(\beta_0\)` using the **`glm()`** function in R, `\(e^{\beta_0}\)` will just be a number between `\(0\)` and `\(\infty\)`.

--

- For example, for the covid data we found that the estimated value of `\(\beta_0\)`, denoted by `\(\hat{\beta}_0\)`, was equal to -1.73, which means that the probability is equal to:

`$$\frac{e^{-1.73}}{1+e^{-1.73}} = \frac{0.18}{1+0.18} = 0.15$$`

--

- In other words, the model predicts that the probability of a positive case is equal to 0.15.

--

- If we think about this in terms of the total number of infected participants in the sample, the model predicts that we should see about 12 positive cases.

- However, this means that we should see approximately 6 in each group, which is fewer than the number observed in the not vaccinated group and more than the number observed in the vaccinated group.

---

## Simple logistic regression model

- Now we need to introduce the simple logistic regression model.

--

- This will be similar to the simple linear regression model that we talked about before.
--

- The main assumption of the model is that the probability of testing positive for covid differs depending on the population that you belong to.

--

- First we have to assign a numeric value to each population and, as before, the population that we assign the value `\(0\)` will be considered the baseline.

--

- We will denote the indicator for vaccination status as:

`$$z_i = \begin{cases} 0 & \quad \text{if vaccination status } = \text{ not vaccinated}\\ 1 & \quad \text{if vaccination status } = \text{ vaccinated} \end{cases}$$`

--

```r
# Code vaccination status as 0 (baseline, not vaccinated) or 1 (vaccinated)
covid <- covid %>%
  mutate(vaccinated_id = case_when(status == "not_vaccinated" ~ 0,
                                   status == "vaccinated" ~ 1))
```

---

## Simple logistic regression model

- Now we have an indicator `\(z_i\)` that takes the value `\(0\)` if the participant belongs to the not vaccinated group and the value `\(1\)` if the participant belongs to the vaccinated group.

--

- The simple logistic regression model assumes that the probability of testing positive for covid is a function of vaccination status.

--

- We can formalize the model as follows:

`$$y_i \sim \text{Bernoulli}\left(\theta_i\right)$$`

- where `\(\theta_i\)` represents the probability of testing positive for the *i-th* participant.

--

- This time the probability has an index `\(i\)` because it will be different depending on whether the participant belongs to the not vaccinated or the vaccinated population.
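---

## Checking the indicator coding

- Before fitting the model it can help to confirm that the indicator was coded as intended. The following is only a sketch: `toy` is a hypothetical four-row data frame standing in for the real `covid` data, and `ifelse()` is used as a base R stand-in for `case_when()`, assuming `status` only takes the two values shown.

```r
# Hypothetical toy data; the real `covid` data frame is not reproduced here.
toy <- data.frame(
  status = c("not_vaccinated", "vaccinated", "vaccinated", "not_vaccinated"),
  test   = c(1, 0, 0, 1)
)

# Base R stand-in for the case_when() coding: 0 = baseline, 1 = vaccinated.
toy$vaccinated_id <- ifelse(toy$status == "vaccinated", 1, 0)

# Cross-tabulate the labels against the indicator to confirm the coding.
coding_check <- table(toy$status, toy$vaccinated_id)
print(coding_check)
```

- Every count should fall on the diagonal of the table: all not vaccinated rows under `0` and all vaccinated rows under `1`.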
---

## From log-odds to probabilities

- Now, remember that our linear function is defined on the logarithm of the odds as follows:

`$$\ln\left(\frac{\theta_i}{1-\theta_i}\right) = \beta_0 + \beta_1z_i$$`

--

- If we take the same steps as with the Null model, then we can define the probability that the *i-th* participant will test positive as:

`$$\theta_i = \frac{e^{\beta_0 + \beta_1z_i}}{1+e^{\beta_0 + \beta_1z_i}}$$`

--

- Notice that the only difference between this model and the Null model is that we now have an additional `\(\beta_1z_i\)` in the exponent; this part of the equation represents the difference between the baseline population (not vaccinated) and the vaccinated participants.

---

## Simple logistic regression

- Now that we have specified the model, we can look at how the parameters can be interpreted.

--

- In this case, `\(\beta_0\)` represents the logarithm of the odds for the baseline population, in our example the not vaccinated participants.

--

- However, the log-odds are difficult to interpret, so we can use a transformation. In this case, `\(e^{\beta_0}\)` is the odds of a positive test in the baseline population: how many times more probable it is to test positive rather than negative.

--

- The value of `\(\beta_0+\beta_1\)` represents the logarithm of the odds for the vaccinated population.

--

- Again, it will be easier to interpret a transformation of this sum. In this case, `\(e^{\beta_0+\beta_1}\)` is the odds of a positive test in the vaccinated population: how many times more probable it is to test positive rather than negative.

--

- The parameter `\(\beta_1\)` can be interpreted as the change in the log-odds associated with being part of the vaccinated population.

---

## Odds ratio

- Again, interpretations in terms of log-odds are difficult to understand in the context of an experiment. However, this time, interpreting the transformation of `\(\beta_1\)` is not as easy because it involves the comparison between two "rates of change".
--

- The value of `\(e^{\beta_1}\)` is also known as the odds ratio. In our example, it compares the odds of testing positive (rather than negative) in the vaccinated population with the odds of testing positive (rather than negative) in the not vaccinated population.

--

- If `\(0<e^{\beta_1}<1\)`, the odds of testing positive are lower in the vaccinated population than in the not vaccinated population. This will happen if `\(\beta_1<0\)`.

--

- If `\(e^{\beta_1}=1\)`, the odds of testing positive are the same regardless of the population that the participant belongs to. This will happen if `\(\beta_1 = 0\)`.

--

- Finally, if `\(1<e^{\beta_1}\)`, the odds of testing positive are higher in the vaccinated population than in the not vaccinated population. This will happen if `\(0<\beta_1\)`.

---

## Probability of a positive test

- This model also allows us to define two probabilities of a positive test, one for each value of `\(z_i\)`.

--

- The probability of testing positive is equal to:

.pull-left[
Not vaccinated group

`$$\theta_{\text{not vaccinated}} = \frac{e^{\beta_0}}{1+e^{\beta_0}}$$`
]

.pull-right[
Vaccinated group

`$$\theta_{\text{vaccinated}} = \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}$$`
]

--

- If we take the ratio between these two values we get the relative risk. The relative risk represents how much more (or less) probable it is to test positive in the vaccinated population in comparison to the not vaccinated population.
--

- The relative risk can be expressed as:

`$$RR = \frac{\theta_{\text{vaccinated}}}{\theta_{\text{not vaccinated}}} = \frac{\frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}}{\frac{e^{\beta_0}}{1+e^{\beta_0}}}$$`

---

## Obtaining the values of the parameters

- Now that we have defined the quantities we are interested in, we can obtain our estimates for `\(\beta_0\)` and `\(\beta_1\)` using the **`glm()`** function in R.

--

```r
# Fit the logistic regression and keep only the estimated coefficients
betas_lr <- glm(formula = test ~ vaccinated_id, data = covid,
                family = "binomial")$coef
```

--

- The estimated value of `\(\beta_0\)` was -1.1, while the estimated value of `\(\beta_1\)` was equal to -1.85.

--

.pull-left[
Estimated odds of a positive test (rather than a negative one) for the not vaccinated population.

`$$e^{\hat{\beta}_0} = e^{-1.1} = 0.33$$`

we can also use:

`$$\frac{1}{e^{\hat{\beta}_0}} = \frac{1}{e^{-1.1}} = 3$$`
]

.pull-right[
Estimated odds of a positive test (rather than a negative one) for the vaccinated population.

`$$e^{\hat{\beta}_0+\hat{\beta}_1} = e^{-2.94} = 0.05$$`

we can also use:

`$$\frac{1}{e^{\hat{\beta}_0+\hat{\beta}_1}} = \frac{1}{e^{-2.94}} = 19$$`
]

---

## Odds ratio, Relative Risk and Probability

- With the estimated values of `\(\beta_0\)` and `\(\beta_1\)` we can also compute and interpret the odds ratio, the relative risk and the probability of a positive test in each group:

--

Odds ratio:

`$$\text{OR} = \frac{\frac{\theta_{\text{vaccinated}}}{1-\theta_{\text{vaccinated}}}}{\frac{\theta_{\text{not vaccinated}}}{1-\theta_{\text{not vaccinated}}}}=\cdots \text{algebra}\cdots = e^{\hat{\beta}_1} = e^{-1.85} = 0.16$$`

we can also have:

`$$\frac{1}{\text{OR}} = \frac{1}{e^{\hat{\beta}_1}} = \frac{1}{e^{-1.85}} = 6.33$$`

- This means that the odds of testing positive (rather than negative) are about 6 times higher in the not vaccinated group than in the vaccinated group.
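---

## Odds ratio: filling in the algebra

- A short sketch of the step labeled "algebra", using only the probabilities defined earlier. For any log-odds value `\(x\)`, if `\(\theta = \frac{e^{x}}{1+e^{x}}\)` then `\(1-\theta = \frac{1}{1+e^{x}}\)`, so the odds reduce to:

`$$\frac{\theta}{1-\theta} = \frac{e^{x}}{1+e^{x}} \cdot \frac{1+e^{x}}{1} = e^{x}$$`

- Setting `\(x = \beta_0+\beta_1\)` for the vaccinated group and `\(x = \beta_0\)` for the not vaccinated group gives:

`$$\text{OR} = \frac{e^{\beta_0+\beta_1}}{e^{\beta_0}} = e^{\beta_0+\beta_1-\beta_0} = e^{\beta_1}$$`

- The intercept cancels, which is why the odds ratio depends only on `\(\beta_1\)`.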
---

## Odds ratio, Relative Risk and Probability

- To get the probability of testing positive in each population we can use the following formulas:

--

- The probability of testing positive for the not vaccinated group is equal to:

`$$\theta_{\text{not vaccinated}} = \frac{e^{\hat{\beta}_0}}{1+e^{\hat{\beta}_0}} = \frac{e^{-1.1}}{1+e^{-1.1}} = \frac{0.33}{1+0.33} = 0.25$$`

--

- The probability of testing positive for the vaccinated group is equal to:

`$$\theta_{\text{vaccinated}} = \frac{e^{\hat{\beta}_0+\hat{\beta}_1}}{1+e^{\hat{\beta}_0+\hat{\beta}_1}} = \frac{e^{-2.94}}{1+e^{-2.94}} = \frac{0.05}{1+0.05} = 0.05$$`

- With those values we can also obtain the Relative Risk:

`$$RR = \frac{\theta_{\text{vaccinated}}}{\theta_{\text{not vaccinated}}} = \frac{0.05}{0.25} \approx 0.22$$`

---

## Odds ratio, Relative Risk and Probability

- It is easier to interpret values of the Relative Risk when they are greater than 1, so we can take the inverse of the Relative Risk and compare the probability in the not vaccinated group to the probability in the vaccinated one.

--

`$$\frac{1}{RR} = \frac{\theta_{\text{not vaccinated}}}{\theta_{\text{vaccinated}}} = \frac{0.25}{0.05} = 4.51$$`

--

- This result suggests that the risk of testing positive for covid is 4.5 times higher in the not vaccinated group than in the vaccinated group.

--

- Keep in mind that this result comes from the particular sample that we have, so it is subject to sampling error.

--

- Actually, the value I used to simulate the data is around 2, which is smaller than the current estimates (2.5 in people 18 or older).

--

- This highlights the importance of large samples in this kind of context. The more observations we have, the more confident we can be in our results.

--
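
- As a rough check, the quantities on these slides can be recomputed in R. This is only a sketch: `b0` and `b1` are hypothetical names holding the rounded estimates -1.1 and -1.85, so the output may not match numbers computed from the unrounded coefficients. `plogis()` is base R's inverse-logit function, `\(e^{x}/(1+e^{x})\)`.

```r
# Rounded estimates reported on the slides (hypothetical variable names).
b0 <- -1.1
b1 <- -1.85

# Odds of a positive test in each group.
odds_not_vacc <- exp(b0)
odds_vacc     <- exp(b0 + b1)

# Odds ratio (vaccinated vs. not vaccinated); algebraically equal to exp(b1).
odds_ratio <- odds_vacc / odds_not_vacc

# Probabilities of a positive test via the inverse logit.
theta_not_vacc <- plogis(b0)
theta_vacc     <- plogis(b0 + b1)

# Relative risk (vaccinated vs. not vaccinated) and its inverse.
rr     <- theta_vacc / theta_not_vacc
rr_inv <- 1 / rr

# Collect everything in one named vector, rounded for display.
round(c(odds_ratio = odds_ratio, theta_not_vacc = theta_not_vacc,
        theta_vacc = theta_vacc, rr = rr, rr_inv = rr_inv), 2)
```

- With the actual fitted coefficients, the same steps could be run with `b0` and `b1` taken from the two elements of `betas_lr`.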